This repository contains the data and results to reproduce the paper by Nguyen et al., submitted to Water Resources Research. This vignette shows the step-by-step computations to reproduce the paper’s results and figures. We also provide some additional details that may be of interest to some readers.
We provide the data in the folder data/. Unfortunately, we can’t give you the instrumental streamflow data for the Yangtze, Mekong, and Pearl Rivers, due to restrictions. Therefore, if you run the code here, you will not get results for these rivers for some computations that require instrumental data. We provide all results in the folder results/, including those for these rivers.
R scripts that contain the intermediate computational steps and utility functions are provided in the folder R/. You may want to check them out before we proceed.
We start by loading the necessary packages and utility functions (if you don’t already have some packages, please install them first).
library(ldsr) # Streamflow reconstruction
library(data.table) # Data wrangling
library(ggplot2) # Plotting
library(patchwork) # Arranging plots
library(cowplot) # Arranging and annotating plots
library(magrittr) # Piping
library(VSURF) # Input variable selection
library(sf) # Mapping
library(geosphere) # Geodetic distance calculation
library(foreach) # Parallel computations
library(doFuture) # Parallel computations
library(future) # Parallel computations
source('R/geo_functions.R') # Geographical processing functions
source('R/correlation_functions.R') # Correlation tools
source('R/utils.R') # Other utilities
options(digits = 4) # For concise printingLet’s read the main data.
instQmeta <- fread('data/instQ_meta.csv', key = 'ID') # Streamflow metadata
instQ <- fread('data/instQ.csv', key = 'ID') # Instrumental streamflow
mada2mat <- readRDS('data/mada2mat.RDS') # MADA v2
mada2xy <- readRDS('data/mada2xy.RDS') # Coordinates of MADA v2 grid pointsNow let’s explore the data sets.
Table S2.
Note. In the paper, we omitted the leading zeros in station IDs due to space constraints. Here we are using the full station IDs. The column code is an encoding for plotting, which will be useful in Figure 5.
Figure 1b.
We have calculated the total upstream reservoir capacity of each station from QGIS and stored in the file data/resVol.csv. We have also calculated the mean annual flow, converted it to million m\(^3\)/year, and stored in the file data/instQmean.csv. Now, we calculate upstream resident time as the ratio between the total upstream reservoir capacity and the mean annual flow.
resVol <- fread('data/resVol.csv')
instQmean <- fread('data/instQmean.csv')
resFrac <- resVol[instQmean, on = 'ID', nomatch = NULL, # Merge
][, frac := resVol / Qm # Calculate ratio
][order(frac, decreasing = TRUE) # Sort by frac
][, ID := factor(ID, levels = ID)] # To maintain plot order
ggplot(resFrac, aes(ID, frac)) +
geom_bar(fill = blues9[6], stat = 'identity') +
scale_x_discrete(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0), breaks = seq(0, 1, by = 0.25), limits = c(0, 1)) +
labs(x = NULL, y = 'Upstream retention time [years]') +
theme_classic() +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 8),
axis.text.y = element_text(hjust = 0),
axis.ticks = element_blank(),
axis.line = element_blank(),
panel.background = element_blank(),
panel.grid.major.y = element_line(colour = 'white'),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
panel.ontop = TRUE)Figure S3, without the Yangtze, Mekong, and Pearl.
Rescale the untransformed and log-transformed flow, then compare the two densities.
transQ <- instQ[, .(None = standardize(Qa), Log = standardize(log(Qa))), by = ID]
ggplot(transQ) +
stat_density(aes(x = None, colour = 'No transformation'),
position = 'identity', geom = 'line', na.rm = TRUE, size = 0.5) +
stat_density(aes(x = Log, colour = 'Log-tranformed'),
position = 'identity', geom = 'line', na.rm = TRUE, size = 0.5) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0)) +
scale_colour_manual(name = NULL,
values = c('steelblue', 'darkorange')) +
facet_wrap(vars(ID), scales = 'free_y', ncol = 6) +
labs(x = 'z-score', y = 'Density') +
theme_classic() +
theme(axis.text = element_blank(),
axis.ticks = element_blank(),
axis.line = element_blank(),
strip.background = element_blank(),
legend.position = 'top',
legend.key.width = unit(1, 'cm'))In case you are not familiar with the MADA, below is a plot of the grid. For background map, we also provide the coastlines of the Monsoon Asia region in the file data/mada-coastline.gpkg. We use the package sf to read this file.
bgMap <- sf::st_read('data/mada-coastline.gpkg', quiet = TRUE)
ggplot(mada2xy) +
geom_point(aes(long, lat), shape = '+', colour = 'steelblue') +
geom_sf(data = bgMap) +
labs(x = NULL, y = NULL) +
coord_sf(expand = FALSE) +
theme_bw() +
theme(panel.grid = element_blank())All MADA grid points end in 2012. We now plot their starting years. This is Figure S4.
mada2startYear <- readRDS('data/mada2start.RDS')
ggplot(mada2startYear) +
geom_raster(aes(long, lat, fill = year)) +
scale_fill_binned(name = 'First year of record', type = 'viridis') +
scale_x_continuous(expand = c(0, 0), labels = pasteLong) +
scale_y_continuous(expand = c(0, 0), labels = pasteLat) +
labs(x = NULL, y = NULL) +
coord_quickmap() +
theme_bw() +
theme(panel.grid = element_blank())Most grid points start before 1200. A few grid points that start after 1200 are ignored.
The main idea is to select MADA grid points that are in a similar climate to the streamflow station of interest. We characterize climate using the KWF hydroclimate system.
We provide the hydroclimate classification system in the file data/kwf.RDS, only for the Monsoon Asia domain.
long and lat are the coordinates of the grid point. x1, x2, y1, y2 are the four corners of the grid cell, which will be used to determine the cell that contains any point on Earth. arid, seas, and snow are the three KWF indices: aridity, seasonality, and snow fraction. col is the RGB colour created from the three indices.
Let’s visualize this data set.
ggplot(kwf) +
geom_raster(aes(long, lat, fill = I(col))) +
labs(x = NULL, y = NULL) +
scale_x_continuous(labels = pasteLong) +
scale_y_continuous(labels = pasteLat) +
coord_quickmap(expand = FALSE) +
theme_bw() +
theme(panel.grid = element_blank())With this, we can identify the KWF cell of each MADA grid cell, so as to determine its climate. The MADA v2 has resolution \(1^\circ \times 1^\circ\) and the KWF has resolution \(0.5^\circ \times 0.5^\circ\). The MADA grid lies nicely on the KWF grid, so all we need to do is a simple left joint.
madaKwfCells <- mada2xy[kwf, on = c('long', 'lat'), nomatch = NULL
][, .(point, long, lat, arid, seas, snow)]
head(madaKwfCells)From here on, we’ll need to use lots of parallel computing, so let’s set this up.
Now we can select MADA grid points based on the KWF distance, using get_mada_by_kwf(). This will take a few seconds on a normal desktop.
kwfRange <- seq(0.1, 0.3, 0.05)
madaPoints <- foreach(s = split(instQmeta, by = 'ID')) %dopar%
lapply(kwfRange, function(kwfMax)
get_mada_by_kwf(c(s$long, s$lat), madaKwfCells, kwf, kwfMax, 2500))Alternatively, you can read the pre-computed results.
Let’s now calculate the correlation between streamflow and the MADA for the selected stations on the Krishna and Chao Phraya (as in the paper), so that we can compare the significantly correlated areas with the search area, like in Figure 3. The file R/correlation_functions.R contains a utitilty function to determine the boundary lines of significant areas.
# Correlation between PDSI and streamflow
group1 <- c('Krishna', 'Chao Phraya')
group1ID <- c('IN_0000117', 'TH_0000178')
row.names(mada2mat) <- 1200:2012
madaLong <-
mada2mat %>%
as.data.table(keep.rownames = 'year') %>%
melt(id.vars = 'year', variable.name = 'point', value.name = 'pdsi') %>%
.[, year := as.numeric(year)]
subCor <- merge(madaLong, instQ[group1ID], by = 'year')[,
{
ct <- cor.test(pdsi, Qa)
list(rho = ct$estimate, p.value = ct$p.value)
},
by = .(ID, point)
][, point := as.numeric(point) # melt() returns factor for variable so this is ok
][mada2xy, on = 'point'
][instQmeta[, .(ID, river, name, long, lat)], on = 'ID', nomatch = NULL]
setnames(subCor, c('i.long', 'i.lat'), c('Qlong', 'Qlat'))
subCor[, name := factor(name, levels = c('Karad', 'C.2'))]
subCor[, signif := p.value < 0.05]
setkey(subCor, long, lat)
subCorSignif <- subCor[, signif_area(.SD, 1, 1), by = .(ID, name)]
names(group1ID) <- group1ID
# Seach area
names(kwfNames) <- kwfNames <- c('0.10', '0.15', '0.20', '0.25', '0.30')
inputPoints <-
lapplyrbind(group1ID, function(s)
lapplyrbind(kwfNames[2:3], function(kwfMax)
data.table(point = madaPoints[[s]][[kwfMax]]),
id = 'kwf'),
id = 'ID'
)[instQmeta[, .(ID, name)], on = 'ID', nomatch = NULL
][mada2xy, on = 'point', nomatch = NULL]
# Plots
inputPoints[, kwf := paste0('d[KWF] == ', kwf)]
inputPoints$name %<>% factor(levels = c('Karad', 'C.2'))
sxy <- instQmeta[group1ID][, name := factor(name, levels = c('Karad', 'C.2'))]We’re now ready to reproduce Figure 3.
selectedPointPlot <- ggplot(inputPoints, aes(long, lat)) +
geom_tile(data = mada2xy, width = 1, height = 1, fill = 'gray') +
geom_tile(fill = '#4daf4a', width = 1, height = 1) +
geom_point(data = sxy, colour = 'red') +
scale_x_continuous(expand = c(0, 0), labels = pasteLong) +
scale_y_continuous(expand = c(0, 0), position = 'right') +
coord_quickmap() +
facet_grid(name ~ kwf, switch = 'y',
labeller = labeller(kwf = label_parsed)) +
theme_bw() +
theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.line = element_blank(),
axis.title = element_blank(),
strip.background = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(hjust = 0.5)) +
labs(title = 'Selected MADA grid points')
corPlot <- ggplot(subCor) +
geom_raster(aes(long, lat, fill = rho)) +
geom_segment(aes(x = long - 0.5, xend = long + 0.5, y = lat + 0.5, yend = lat + 0.5),
size = 0.1,
data = subCorSignif[top == TRUE]) +
geom_segment(aes(x = long - 0.5, xend = long + 0.5, y = lat - 0.5, yend = lat - 0.5),
size = 0.1,
data = subCorSignif[bottom == TRUE]) +
geom_segment(aes(x = long - 0.5, xend = long - 0.5, y = lat - 0.5, yend = lat + 0.5),
size = 0.1,
data = subCorSignif[left == TRUE]) +
geom_segment(aes(x = long + 0.5, xend = long + 0.5, y = lat - 0.5, yend = lat + 0.5),
size = 0.1,
data = subCorSignif[right == TRUE]) +
geom_point(aes(Qlong, Qlat), colour = 'red') +
scale_x_continuous(expand = c(0, 0), labels = pasteLong) +
scale_y_continuous(expand = c(0, 0), labels = pasteLat) +
scale_fill_distiller(name = 'Correlation', palette = 'RdBu', direction = 1,
breaks = scales::pretty_breaks(3), limits = absRange(subCor$rho)) +
coord_quickmap() +
labs(x = NULL, y = NULL) +
facet_wrap(~name, ncol = 1, strip.position = 'right') +
theme_bw() +
theme(panel.grid = element_blank(),
axis.line = element_blank(),
legend.position = 'top',
legend.key.width = unit(0.6, 'cm'),
strip.background = element_blank())
s2 <- plot_grid(selectedPointPlot, corPlot,
ncol = 2, align = 'hv', axis = 'trbl', rel_widths = c(1.75, 1),
labels = c('a)', 'b)'), label_size = 10)
s2a <- ggdraw(s2) +
draw_label('(Krishna)', 0.62, 0.650, size = 10, angle = 90) +
draw_label('(Chao Phraya)', 0.62, 0.250, size = 10, angle = 90)
s2aWe need to calculate the correlation between each grid cell and each station. The result is a correlation matrix, each row is a station and each column is a MADA grid point.
corMat <-
instQ[,
{
ind <- which(1200:2012 %in% year)
as.data.frame(cor(Qa, mada2mat[ind, ], use = 'complete.obs'))
},
keyby = ID] %>%
as.matrix(rownames = TRUE)We also need to setup the p parameter, and do some small preparations.
pRange <- c('0' = 0, '0.5' = 0.5, '2/3' = 2/3, '1' = 1, '1.5' = 1.5, '2' = 2)
stationIDs <- instQmeta[ID %in% instQ$ID, ID]
# Set names so that we can name list elements conveniently later
names(pNames) <- pNames <- names(pRange)
names(stationIDs) <- stationIDs pca <-
foreach(s = stationIDs,
.final = function(x) setNames(x, stationIDs)) %dopar% {
lapply(kwfNames, function(kwfMax) {
points <- madaPoints[[s]][[kwfMax]]
X <- mada2mat[, points]
rho <- corMat[s, points]
lapply(pRange, function(p) {
pcaModel <- wPCA(X, rho, p)
get_PCs(pcaModel)
})
})
}The previous step gives us a 30-member ensemble from 5 KWF distance and 6 PCA weights. You can type View(pca) to see the results. Each ensemble member contains a set of weighted PCs.
Next, use the VSURF algorithm to select a subset of principal components for each ensemble. This code will take about 20 minutes, so get yourself a drink while it runs. Or skip to the next chunk and read the pre-computed results.
ivs <-
foreach(s = stationIDs,
.packages = 'data.table',
.final = function(x) setNames(x, stationIDs)) %:%
foreach(kwfMax = kwfNames,
.final = function(x) setNames(x, kwfNames)) %:%
foreach(p = pNames,
.final = function(x) setNames(x, pNames)) %dopar% {
Qa <- instQ[s][!is.na(Qa)]
idx <- which(1200:2012 %in% Qa$year)
input_selection(pca[[s]][[kwfMax]][[p]][idx], Qa$Qa,
nvmax = 8, method = 'VSURF', parallel = FALSE)
}ivs contains the names of the selected PCs. Type View(ivs) to see its content.
Finally, we combine pca and ivs to have an ensemble of selected PCs.
ensemblePCs <-
lapply(stationIDs, function(s)
lapply(kwfNames, function(kwfMax)
lapply(pNames, function(p) {
sv <- ivs[[s]][[kwfMax]][[p]]
pca[[s]][[kwfMax]][[p]][, ..sv]
})))If you skipped the computations above, you can read the pre-computed weight PCA ensembles here.
This cross-validation step is also a tuning step. We cross-validate all ensemble members and select one that has the highest mean KGE.
We will also check ensemble averaged models where we average over all p, over all kwf distances, and over both p and kwf.
Before we start, we need to make the cross-validation folds.
set.seed(100)
cvPoints <-
instQ %>%
split(by = 'ID') %>%
lapply('[[', 'Qa') %>%
lapply(make_Z, frac = 0.25, nRuns = 30, contiguous = TRUE)We also need to determine whether streamflow needs to be log-transformed for each station.
trans <- instQ[, .(trans = ifelse(abs(hinkley(log(Qa))) < abs(hinkley(Qa)), 'log', 'none')),
by = ID]Running the above line of code now only gives you 30 stations, so we provided the precomputed one.
This code takes many hours to run, so you should only run it on a server (one that has 24-32 cores or so), or on a cluster.
ldsScores <-
foreach(s = stationIDs,
.packages = c('data.table', 'ldsr'),
.final = function(x) {
names(x) <- stationIDs
rbindlist(x, idcol = 'ID')
}) %dopar% {
Qa <- instQ[s]
Z <- cvPoints[[s]]
transform <- trans[s, trans]
obs <- if (transform == 'log') log(Qa$Qa) else Qa$Qa
mu <- mean(obs, na.rm = TRUE)
instPeriod <- which(1200:2012 %in% Qa$year)
y <- t(c(rep(NA, Qa[1, year] - 1200), # Before the instrumental period
obs - mu, # Instrumental period
rep(NA, 2012 - Qa[.N, year]))) # After the instrumental period
# Individual models
Ycv <- lapplyrbind(kwfNames, function(kwfMax)
lapplyrbind(pNames, function(p) {
u <- t(ensemblePCs[[s]][[kwfMax]][[p]])
lapplyrbind(Z, function(z)
data.table(year = Qa$year,
Q = one_lds_cv(z, instPeriod,
mu, y, u, u, 'EM', num.restarts)))
}))
colnames(Ycv)[1:3] <- c('kwf', 'p', 'rep')
# Different ensembles
ensP <- Ycv[, .(p = 'Ensemble', Q = mean(Q)), by = .(kwf, rep, year)]
ensKWF <- Ycv[, .(kwf = 'Ensemble', Q = mean(Q)), by = .(p, rep, year)]
ens <- Ycv[, .(p = 'Ensemble', kwf = 'Ensemble', Q = mean(Q)), by = .(rep, year)]
# Merge & calculate scores
Ycv <- rbind(Ycv, ensP, ensKWF, ens)
scores <- Ycv[,
{
Q <- if (transform == 'log') exp(Q) else Q
metrics <- calculate_metrics(Q, Qa$Qa, Z[[rep]])
data.table(metric = names(metrics), value = metrics)
},
by = .(kwf, p, rep)
][, .(value = mean(value)), by = .(kwf, p, metric)]
out <- list(Ycv = Ycv, scores = scores)
scores
}
ldsScores[, type := fifelse(kwf == 'Ensemble' & p == 'Ensemble',
'both',
fifelse(kwf == 'Ensemble' | p == 'Ensemble',
'one',
'none'))]
ldsScores[, p := factor(p, levels = c(pNames, 'Ensemble'))]Let’s plot the distribution of KGE
pasteP <- function(x) ifelse(x == '0', paste('p =', x), x)
ggplot(ldsScores[metric == 'KGE']) +
geom_boxplot(aes(kwf, value, colour = type),
position = position_dodge2(padding = 0.25)) +
scale_colour_manual(name = NULL, values = c('red', 'steelblue', 'darkorange'),
guide = 'none') +
scale_fill_manual(name = 'Metric space', values = c(NA, 'gray90')) +
scale_y_continuous(breaks = scales::pretty_breaks(3)) +
facet_grid(vars(p), scales = 'free_y', labeller = labeller(.rows = pasteP)) +
theme_classic() +
labs(y = 'KGE') +
theme(legend.position = 'none',
strip.background = element_blank(),
strip.text.x = element_blank(),
strip.text.y = element_text(face = 'bold', size = 8),
axis.text.y = element_text(size = 8))Averaging over the ensemble does not improve the socres significantly, and it breaks the state-streamflow relationship. So we use the best member instead of the ensemble average.
Let’s now select the best ensemble member and check the KGE value of the selected ones.
ldsChoice <- ldsScores[metric == 'KGE' & type == 'none',
.SD[value == max(value)],
by = ID,
.SDcols = c('kwf', 'p', 'value')]
setnames(ldsChoice, 'value', 'KGE')
setkey(ldsChoice, ID)
ldsChoiceNext, we do the following steps
scores <- ldsScores[ldsChoice, on = c('ID', 'kwf', 'p')
][, .(ID, metric, value)
][, dcast(.SD, ID ~ metric)
][instQmeta, on = 'ID']
scoresWe are now ready to plot the skill score distribution (Figure 4). The function to do this is provided in R/plot_metric_map.R.
source('R/plot_metric_map.R')
g1 <- metric_map(scores, 'RE', numClasses = 9, bgMap = bgMap,
dotSize = 1,
histPosition = 'bottom', histBarDirection = 'vertical')
g2 <- metric_map(scores, 'CE', numClasses = c(1, 9), bgMap = bgMap,
dotSize = 1,
histPosition = 'bottom', histBarDirection = 'vertical')
g3 <- metric_map(scores, 'KGE', numClasses = 9, bgMap = bgMap,
dotSize = 1,
histPosition = 'bottom', histBarDirection = 'vertical')
p1 <- g1[[1]] + theme(plot.tag.position = c(0.16, 0.98),
plot.margin = margin(t = 0.1, r = 0.1, b = 0.1, l = 0.1, unit = 'cm'))
p2 <- g1[[2]] + theme(plot.tag.position = c(0.16, 1.15),
plot.margin = margin(t = 1, r = 0.1, b = 0.1, l = 0.1, unit = 'cm'))
p3 <- g2[[1]] + theme(plot.tag.position = c(-0.03, 0.98),
axis.text.y = element_blank())
p4 <- g2[[2]] + theme(plot.tag.position = c(-0.03, 1.15),
plot.margin = margin(t = 1, r = 0.1, b = 0.1, l = 0.5, unit = 'cm'),
axis.text.y = element_blank(),
axis.title.y = element_blank())
p5 <- g3[[1]] + theme(plot.tag.position = c(-0.03, 0.98),
plot.margin = margin(t = 0.1, r = 0.1, b = 0.1, l = 0.5, unit = 'cm'),
axis.text.y = element_blank())
p6 <- g3[[2]] + theme(plot.tag.position = c(-0.03, 1.15),
plot.margin = margin(t = 1, r = 0.1, b = 0.1, l = 0.5, unit = 'cm'),
axis.text.y = element_blank(),
axis.title.y = element_blank())
scorePlot <- p1 + p2 + p3 + p4 + p5 + p6 +
plot_layout(byrow = FALSE, ncol = 3, heights = c(1, 0.65)) +
plot_annotation(tag_levels = 'a', tag_suffix = ')') &
theme(plot.tag = element_text(size = 12, face = 'bold'))
scorePlotWe now build the full reconstruction using the selected models. This code takes about 30 seconds, and will give you reconstruction results for all available stations. To get results for all stations, skip to the next chunk to read the pre-computed results.
ldsFit <-
foreach(s = stationIDs,
.packages = c('data.table', 'ldsr'),
.final = function(x) setNames(x, stationIDs)) %dopar% {
Qa <- instQ[s]
p <- ldsChoice[s, p]
kwfMax <- ldsChoice[s, kwf]
u <- t(ensemblePCs[[s]][[kwfMax]][[p]])
transform = trans[s, trans]
LDS_reconstruction(Qa, u, u, 1200, transform = transform, num.restarts = 20)
}We also provide the full 30-member ensemble results in the file results/ldsFitFull.RDS, so you can inspect other models as well.
Now let’s plot the reconstructed flow history (Figure 5). The plot function is provided in R/flow_history.R We need to normalize reconstructed flow with the observed mean and standard deviation, these are provided in the file data/instQsummary.csv.
source('R/flow_history.R')
instQsummary <- fread('data/instQsummary.csv', key = 'ID')
ldsRec <- lapplyrbind(ldsFit, '[[', 'rec', id = 'ID')
setkey(ldsRec, ID)
fh <- flow_history(ldsRec, instQmeta, trans,
plotGap = TRUE, stdType = 'inst', instSummary = instQsummary)
fh <- patchworkGrob(fh)
fhWrap <- ggdraw(fh)
yLine <- c(0.3539, 0.4867)
yText <- 0.375fhAnnotated <- fhWrap +
draw_line(c(0.0755, 0.1392), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.0995, 0.1408), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.1195, 0.2375), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.3700, 0.2705), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.3900, 0.3000), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.5925, 0.3262), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.6122, 0.3565), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.6361, 0.3575), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.6575, 0.5637), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.6889, 0.5666), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.7100, 0.6950), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.8145, 0.7082), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.8353, 0.7325), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.8915, 0.7392), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.9120, 0.7608), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.9358, 0.7615), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.9565, 0.8285), yLine, colour = 'steelblue', size = 0.2) +
draw_line(c(0.9805, 0.8305), yLine, colour = 'steelblue', size = 0.2) +
draw_text(paste0('(', 1:9, ')'),
c(0.090, 0.244, 0.490, 0.623, 0.674, 0.767, 0.863, 0.921, 0.965),
yText, size = 9) +
draw_text(c('a)', 'b)'), 0.03, c(0.95, 0.4), size = 11, fontface = 'bold')
fhAnnotatedZoom in to the instrumental period (Figure S5a).
flow_history(ldsRec, instQmeta, trans,
stdType = 'inst', instSummary = instQsummary,
startYear = 1950,
plotLower = FALSE, plotSegments = FALSE,
breaks = seq(1950, 2010, 10))Figure 6.
# SST data
sst <- readRDS('data/seasonal_sst.RDS')
sstxy <- fread('data/sstxy.csv', key = c('long', 'lat'))
sst <- merge(sst, sstxy, by = 'point')
sstLand <- readRDS('data/sstLand.RDS')
# Select two stations
outletStations <- c('Karad', 'C.2', 'Stung Treng', 'Datong')
outletLookup <- instQmeta[name %in% outletStations]
outletLookup[river == 'Karad', river := 'Krishna']
outlet <- ldsRec[outletLookup] %>% split(by = 'ID')
# Calculate correlations
combi <- data.table(season = c('JJA', 'SON', 'DJF', 'MAM', 'JJA', 'SON', 'DJF'),
lag = c( -1, -1, 0, 0, 0, 0, 1)) %>%
split(1:nrow(.))
sstCor <-
foreach(Q = outlet, .combine = rbind, .packages = 'data.table') %:%
foreach(case = combi, .combine = rbind) %dopar%
get_sst_cor(Q, case$season, case$lag, 'Q')
# Set factors to determine plot order
sstCor[season == 'DJF (-1)', season := 'DJF (-2)']
sstCor[season == 'DJF' , season := 'DJF (-1)']
sstCor[season == 'DJF (+1)', season := 'DJF']
sstCor$season %<>% factor(levels = c('JJA (-1)', 'SON (-1)', 'DJF (-1)',
'MAM', 'JJA', 'SON',
'DJF'))
sstCor$river %<>% factor(levels = c('Krishna', 'Chao Phraya', 'Mekong', 'Yangtze'))
sstCor[, signif := p.value < 0.05]
setkey(sstCor, long, lat)
sstCor <- sstCor[sstxy, on = c('long', 'lat')]
sstSignifArea <- sstCor[, signif_area(.SD, 2, 2), by = .(season, river)]sstCorPlot <- plot_sst_cor(sstCor, sstSignifArea, sstLand) +
theme(legend.position = 'right',
legend.key.width = unit(0.4, 'cm'),
legend.key.height = unit(1, 'cm'),
strip.text.y = element_text(size = 9),
strip.background.x = element_blank(),
axis.line = element_blank())
ggdraw(sstCorPlot + theme(plot.margin = margin(l = 0.7, r = 0.2, unit = 'cm'))) +
draw_label('Decaying ENSO', x = 0.02, y = 0.75, angle = 90, size = 10) +
draw_label('Ongoing ENSO', x = 0.02, y = 0.3, angle = 90, size = 10) +
draw_text(paste0(letters[1:4], ')'), c(0.074, 0.275, 0.485, 0.69), 0.965,
fontface = 'bold', size = 11)Figure S6, but without instrumental data for the Mekong and Yangtze.
# Instrumental data
outletInst <- instQ[outletLookup][year %in% 1955:2004] %>% split(by = 'ID')
sstCorInst <-
foreach(Q = outletInst, .combine = rbind, .packages = 'data.table') %:%
foreach(case = combi, .combine = rbind) %dopar%
get_sst_cor(Q, case$season, case$lag, 'Qa')
# Set factors to determine plot order
sstCorInst[season == 'DJF (-1)', season := 'DJF (-2)']
sstCorInst[season == 'DJF' , season := 'DJF (-1)']
sstCorInst[season == 'DJF (+1)', season := 'DJF']
sstCorInst$season %<>% factor(levels = c('JJA (-1)', 'SON (-1)', 'DJF (-1)',
'MAM', 'JJA', 'SON',
'DJF'))
sstCorInst$river %<>% factor(levels = c('Krishna', 'Chao Phraya', 'Mekong', 'Yangtze'))
sstCorInst[, signif := p.value < 0.05]
setkey(sstCorInst, long, lat)
sstCorInst <- sstCorInst[sstxy, on = c('long', 'lat')]
sstSignifAreaInst <- sstCorInst[, signif_area(.SD, 2, 2), by = .(season, river)]
# Reconstruction, breaking into periods
sstCor <-
foreach(Q = outlet, .combine = rbind, .packages = 'data.table') %:%
foreach(case = combi, .combine = rbind) %:%
foreach(startYear = c(1855, 1905, 1955), .combine = rbind) %dopar% {
ans <- get_sst_cor(Q[between(year, startYear, startYear + 49)],
case$season,
case$lag,
'Q')
ans[, period := startYear][]
}
# Set factors to determine plot order
sstCor[season == 'DJF (-1)', season := 'DJF (-2)']
sstCor[season == 'DJF' , season := 'DJF (-1)']
sstCor[season == 'DJF (+1)', season := 'DJF']
sstCor$season %<>% factor(levels = c('JJA (-1)', 'SON (-1)', 'DJF (-1)',
'MAM', 'JJA', 'SON',
'DJF'))
sstCor$river %<>% factor(levels = c('Krishna', 'Chao Phraya', 'Mekong', 'Yangtze'))
sstCor[, signif := p.value < 0.05]
setkey(sstCor, long, lat)
sstCor <- sstCor[sstxy, on = c('long', 'lat')]
sstSignifArea <- sstCor[, signif_area(.SD, 2, 2), by = .(season, river, period)]periodLab <- function(x, dx = 49) paste0('Reconstruction, ', x, '\u2013', x + dx)
limits <- absRange(sstCor$cor)
sstCorPlot <-
plot_sst_cor(sstCorInst, sstSignifAreaInst, sstLand, limits = limits) +
labs(title = 'Instrumental streamflow, 1955–2004') +
theme(legend.position = 'bottom',
legend.key.width = unit(2, 'cm'),
strip.text.y = element_text(size = 9),
strip.text.x = element_text(size = 11),
strip.background.x = element_blank(),
axis.line = element_blank())
legend <- get_legend(sstCorPlot)
# Add empty space to the right to replace Mekong and Yangtze
sstCorPlot2 <- plot_grid(sstCorPlot + theme(legend.position = 'none'), NULL, ncol = 2)
instCorPlot <- ggdraw(sstCorPlot2 +
theme(plot.margin = margin(l = 0.7, r = 0.2, unit = 'cm'))) +
draw_label('Decaying ENSO', x = 0.02, y = 0.75, angle = 90, size = 10) +
draw_label('Ongoing ENSO', x = 0.02, y = 0.3, angle = 90, size = 10)
pl <- lapply(c(1855, 1905, 1955), function(x) {
p <- plot_sst_cor(sstCor[period == x], sstSignifArea[period == x], sstLand,
limits = limits) +
labs(title = periodLab(x)) +
theme(strip.text.y = element_text(size = 9),
strip.text.x = element_text(size = 11),
strip.background.x = element_blank(),
axis.line = element_blank())
ggdraw(p +
theme(legend.position = 'none',
plot.margin = margin(l = 0.7, r = 0.2, unit = 'cm'))) +
draw_label('Decaying ENSO', x = 0.02, y = 0.75, angle = 90, size = 10) +
draw_label('Ongoing ENSO', x = 0.02, y = 0.3, angle = 90, size = 10)
})
plots <- plot_grid(pl[[1]], pl[[2]], pl[[3]], instCorPlot,
ncol = 2, labels = paste0(letters[1:4], ')'))
legend_plot <- plot_grid(NULL, legend, rel_widths = c(0.5, 1))
plot_grid(plots, legend_plot, nrow = 2, rel_heights = c(15, 1))